## Constructive Computer Architecture:

## Non-Pipelined Processors - 2

### Arvind, Computer Science & Artificial Intelligence Lab., Massachusetts Institute of Technology

October 4, 2017

## Single-Cycle Implementation

### code structure

    module mkProc(Proc);
      // instantiate the state
      Reg#(Addr) pc   <- mkRegU;
      RFile      rf   <- mkRFile;
      IMemory    iMem <- mkIMemory;
      DMemory    dMem <- mkDMemory;

      rule doProc;
        let inst  = iMem.req(pc);
        let dInst = decode(inst);                   // extracts fields needed for execution
        let rVal1 = rf.rd1(dInst.rSrc1);
        let rVal2 = rf.rd2(dInst.rSrc2);
        let eInst = exec(dInst, rVal1, rVal2, pc);  // produces values needed to update the processor state
        // update rf, pc and dMem

http://csg.csail.mit.edu/6.175

## Single-Cycle RISC-V: atomic state updates

        // state updates
        if (isValid(eInst.dst))   // register write
          rf.wr(fromMaybe(?, eInst.dst), eInst.data);
        pc <= eInst.brTaken ? eInst.addr : pc + 4;
      endrule
    endmodule

The whole processor is described using one rule, built from lots of big combinational functions.
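The one-rule structure can be mimicked in software. The following is a minimal Python sketch, not Bluespec and not the course's code: the toy instruction tuples and the names `Inst`, `ExecInst`, `exec_`, and `step` are hypothetical, chosen only for illustration. It shows the pattern of reading the old state, computing through a pure function, and committing every update together.

```python
from collections import namedtuple

# Hypothetical toy instruction format (not the RISC-V encoding)
Inst = namedtuple("Inst", "op dst src1 src2 imm")
ExecInst = namedtuple("ExecInst", "dst data br_taken addr")

def exec_(inst, r_val1, r_val2, pc):
    """Pure 'combinational' function: it touches no processor state."""
    if inst.op == "add":
        return ExecInst(inst.dst, r_val1 + r_val2, False, 0)
    if inst.op == "addi":
        return ExecInst(inst.dst, r_val1 + inst.imm, False, 0)
    if inst.op == "bne":   # branch to pc + imm if src1 != src2
        return ExecInst(None, 0, r_val1 != r_val2, pc + inst.imm)
    raise ValueError(f"unknown op {inst.op}")

def step(pc, rf, i_mem):
    """One 'rule firing': read old state, compute, then commit all updates."""
    inst = i_mem[pc // 4]
    e = exec_(inst, rf[inst.src1], rf[inst.src2], pc)
    new_rf = list(rf)
    if e.dst is not None:            # register write only for a valid destination
        new_rf[e.dst] = e.data
    new_pc = e.addr if e.br_taken else pc + 4
    return new_pc, new_rf

# r1 counts up until it equals r2
program = [
    Inst("addi", 2, 0, 0, 3),      # r2 = r0 + 3
    Inst("addi", 1, 1, 0, 1),      # r1 = r1 + 1
    Inst("bne", None, 1, 2, -4),   # loop back while r1 != r2
]
pc, rf, steps = 0, [0] * 4, 0
while pc // 4 < len(program):
    pc, rf = step(pc, rf, program)
    steps += 1
```

Each call to `step` completes exactly one instruction, which mirrors one firing of `doProc`; data memory is left out of the sketch to keep it short.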




## Instructions to Read and Write CSR

6.175 convention uses a CSR (mtohost) to communicate with the host

| 12 bits | 5 bits | 3 bits | 5 bits | 7 bits |
|---------|--------|--------|--------|--------|
| csr     | rs1    | funct3 | rd     | opcode |

- opcode = SYSTEM
- CSRW rs1, csr (funct3 = CSRRW, rd = x0): csr ← rs1
- CSRR csr, rd (funct3 = CSRRS, rs1 = x0): rd ← csr
- New enums in IType: Csrr, Csrw

typedef Bit#(12) CsrIndx; // CSR index is 12-bit

CSR is needed as an additional field in DecodedInst and ExecInst types

```
Maybe#(CsrIndx) csr;
```
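The field layout above can be checked with a few lines of bit manipulation. This is a hedged Python sketch rather than the course's decoder: the opcode and funct3 constants are the standard RISC-V values, while the helper names and the CSR index `0x780` used for mtohost are assumptions made for illustration.

```python
# Standard RISC-V constants: SYSTEM opcode and the CSRRW/CSRRS funct3 values
OPCODE_SYSTEM = 0b1110011
FUNCT3_CSRRW = 0b001
FUNCT3_CSRRS = 0b010

def encode_csr(csr, rs1, funct3, rd):
    """Pack the 12|5|3|5|7-bit fields into one 32-bit instruction word."""
    assert 0 <= csr < (1 << 12) and 0 <= rs1 < 32 and 0 <= rd < 32
    return (csr << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | OPCODE_SYSTEM

def decode_csr(word):
    """Split a 32-bit word back into its fields."""
    return {
        "csr": (word >> 20) & 0xFFF,
        "rs1": (word >> 15) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rd": (word >> 7) & 0x1F,
        "opcode": word & 0x7F,
    }

def classify(word):
    """Return 'Csrw', 'Csrr', or None, following the convention in the text."""
    f = decode_csr(word)
    if f["opcode"] != OPCODE_SYSTEM:
        return None
    if f["funct3"] == FUNCT3_CSRRW and f["rd"] == 0:
        return "Csrw"
    if f["funct3"] == FUNCT3_CSRRS and f["rs1"] == 0:
        return "Csrr"
    return None

MTOHOST = 0x780  # assumed CSR index, for illustration only

csrw_word = encode_csr(MTOHOST, rs1=10, funct3=FUNCT3_CSRRW, rd=0)
csrr_word = encode_csr(MTOHOST, rs1=0, funct3=FUNCT3_CSRRS, rd=5)
```

The `classify` helper plays the role of the new `Csrr` and `Csrw` cases in `IType`; any other SYSTEM instruction falls through to `None`.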


## Code with CSRs

    // csrf: module that implements all CSRs
    let csrVal = csrf.rd(fromMaybe(?, dInst.csr));
    let eInst = exec(dInst, rVal1, rVal2, pc, csrVal);  // pass CSR values to execute CSRR

    // write CSR (CSRW instruction) and indicate the completion of an instruction
    csrf.wr(eInst.iType == Csrw ? eInst.csr : Invalid, eInst.data);

We did not show these lines in our processor to avoid cluttering the slides.
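One way to read the `csrf.wr` call is that it fires for every instruction, with an Invalid index for everything except CSRW. The Python sketch below models that reading. It is an assumption-laden illustration, not the course's `csrf` module: the completion counter `num_insts`, the `to_host` queue, and the index `0x780` are all hypothetical.

```python
MTOHOST = 0x780  # assumed CSR index, for illustration only

class CsrFile:
    """Toy CSR file: wr() is called once per instruction, valid index or not."""

    def __init__(self):
        self.regs = {}
        self.num_insts = 0   # assumed counter of completed instructions
        self.to_host = []    # assumed queue of values sent to the host

    def rd(self, idx):
        # Unwritten CSRs read as 0 in this sketch
        return self.regs.get(idx, 0)

    def wr(self, idx, data):
        # idx is None for every instruction other than CSRW (models Invalid)
        self.num_insts += 1
        if idx is not None:
            self.regs[idx] = data
            if idx == MTOHOST:
                self.to_host.append(data)

csrf = CsrFile()
csrf.wr(None, 0)        # an ordinary instruction completes
csrf.wr(MTOHOST, 42)    # a CSRW to mtohost completes and reaches the host
```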


## Communicating with the host

We will provide C library functions such as print, which use a CSR to communicate with the host; you will almost never need to write code that communicates with the host directly



## Structural Hazards

- Sometimes multicycle implementations are necessary because of resource conflicts, a.k.a. structural hazards
  - Princeton-style architectures use the same memory for instructions and data and consequently require at least two cycles to execute Load/Store instructions
  - If the register file supported fewer than two reads and one write concurrently, then most instructions would take more than one cycle to execute

Usually extra registers are required to hold values between cycles
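The cost of the shared-memory hazard can be estimated by counting memory accesses per instruction. The Python sketch below assumes a memory that serves one access per cycle and a non-pipelined machine; the function names and the sample trace are hypothetical.

```python
def cycles_per_instruction(op, unified_memory):
    """Cycles for one instruction on a non-pipelined machine.

    Assumption: a unified (Princeton-style) memory serves one access per
    cycle, so a load or store needs one cycle for the instruction fetch
    and a second one for the data access.
    """
    memory_accesses = 1 + (1 if op in ("load", "store") else 0)
    return memory_accesses if unified_memory else 1

def total_cycles(trace, unified_memory):
    """Sum the per-instruction cycle counts over an instruction trace."""
    return sum(cycles_per_instruction(op, unified_memory) for op in trace)

trace = ["add", "load", "add", "store", "branch"]
separate = total_cycles(trace, unified_memory=False)  # separate iMem and dMem
shared = total_cycles(trace, unified_memory=True)     # one shared memory
```

The difference between `shared` and `separate` equals the number of loads and stores in the trace, which is the price of the resource conflict in this simple model.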




## **Two-Cycle RISC-V**

    module mkProc(Proc);
      Reg#(Addr)  pc    <- mkRegU;
      RFile       rf    <- mkRFile;
      IMemory     iMem  <- mkIMemory;
      DMemory     dMem  <- mkDMemory;
      Reg#(Data)  f2d   <- mkRegU;
      Reg#(State) state <- mkReg(Fetch);

      // If state is Fetch, then fetch the instruction, put it in f2d,
      // and change the state to Execute
      rule doFetch (state == Fetch);
        let inst = iMem.req(pc);
        f2d   <= inst;
        state <= Execute;
      endrule

      // If state is Execute, then execute the instruction in f2d
      // and change the state to Fetch
      rule doExecute (state == Execute);
        let inst  = f2d;
        let dInst = decode(inst);
        // ... Copy the code from slides 2 and 3 ...
        pc    <= eInst.brTaken ? eInst.addr : pc + 4;
        state <= Fetch;
      endrule
    endmodule
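The two-rule state machine can be modeled in a few lines. The Python sketch below is a behavioral illustration under assumptions, not the Bluespec design: the two-operation toy instruction set (`addi`, `bnez`) and the class name `TwoCycleProc` are hypothetical. Exactly one of the two "rules" fires in each cycle, selected by `state`.

```python
FETCH, EXECUTE = "Fetch", "Execute"

class TwoCycleProc:
    def __init__(self, program):
        self.i_mem = program
        self.pc = 0
        self.rf = [0] * 4
        self.f2d = None          # holds the fetched instruction between cycles
        self.state = FETCH
        self.cycles = 0
        self.retired = 0

    def halted(self):
        return self.state == FETCH and self.pc // 4 >= len(self.i_mem)

    def cycle(self):
        """Exactly one of the two 'rules' fires per cycle."""
        if self.state == FETCH:                  # doFetch
            self.f2d = self.i_mem[self.pc // 4]
            self.state = EXECUTE
        else:                                    # doExecute
            op, dst, src, imm = self.f2d
            br_taken = False
            if op == "addi":
                self.rf[dst] = self.rf[src] + imm
            elif op == "bnez":                   # branch to pc + imm if rf[src] != 0
                br_taken = self.rf[src] != 0
            self.pc = self.pc + imm if br_taken else self.pc + 4
            self.state = FETCH
            self.retired += 1
        self.cycles += 1

# Toy tuples: (op, dst, src, imm)
program = [
    ("addi", 1, 0, 2),     # r1 = r0 + 2
    ("addi", 1, 1, -1),    # r1 = r1 - 1
    ("bnez", 0, 1, -4),    # loop back while r1 != 0
]
proc = TwoCycleProc(program)
while not proc.halted():
    proc.cycle()
```

In this model every instruction costs two cycles regardless of its type, so `proc.cycles` is always twice `proc.retired`.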

## Two-Cycle RISC-V: Analysis



*Pipeline execution of instructions to increase the throughput* 


## Problems in Instruction pipelining

- Control hazard: Inst<sub>i+1</sub> is not known until Inst<sub>i</sub> is at least decoded. So which instruction should be fetched?
- Structural hazard: Two instructions in the pipeline may require the same resource at the same time, e.g., contention for memory
- Data hazard: Inst<sub>i</sub> may affect the state of the machine (pc, rf, dMem); Inst<sub>i+1</sub> must be fully cognizant of this change

None of these hazards were present in the FFT pipeline


## Arithmetic versus Instruction pipelining

The data items in an arithmetic pipeline, e.g., FFT, are independent of each other



In processors, older instructions in the pipeline may affect the younger ones

- This causes pipeline stalls or requires other fancy tricks to avoid stalls
- Processor pipelines are significantly more complicated than arithmetic pipelines


## Hazards can't be wished away

The power of computers comes from the fact that the instructions in a program are *not* independent of each other

$\Rightarrow$ must deal with hazards

## **Control Hazards**



Inst<sub>i+1</sub> is not known until Inst<sub>i</sub> is at least decoded. So which instruction should be fetched?

- General solution: speculate, i.e., predict the next instruction address
  - requires the next-instruction-address prediction machinery; can be as simple as pc+4
  - prediction machinery is usually elaborate because it dynamically learns from the past behavior of the program
- What if speculation goes wrong?
  - machinery to kill the wrong-path instructions, restore the correct processor state and restart the execution at the correct pc
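The two styles of prediction mentioned above can be compared on a trace of executed pcs. The Python sketch below is illustrative only: `NextLinePredictor` is the pc+4 guess, and `LastTargetPredictor` is a hypothetical, very simple learning predictor that remembers the last observed successor of each pc. Neither is claimed to be the predictor used in the course.

```python
class NextLinePredictor:
    """Always guesses pc + 4."""
    def predict(self, pc):
        return pc + 4
    def update(self, pc, actual_next):
        pass  # nothing to learn

class LastTargetPredictor:
    """Remembers the last observed successor of each pc; defaults to pc + 4."""
    def __init__(self):
        self.table = {}
    def predict(self, pc):
        return self.table.get(pc, pc + 4)
    def update(self, pc, actual_next):
        self.table[pc] = actual_next

def count_mispredictions(pc_trace, predictor):
    """pc_trace lists the pcs actually executed, in program order."""
    wrong = 0
    for pc, actual_next in zip(pc_trace, pc_trace[1:]):
        if predictor.predict(pc) != actual_next:
            wrong += 1   # the wrong-path fetch would have to be killed
        predictor.update(pc, actual_next)
    return wrong

# A loop body at pcs 4 and 8 executed four times, then fall-through to 12
pc_trace = [0, 4, 8, 4, 8, 4, 8, 4, 8, 12]
next_line_wrong = count_mispredictions(pc_trace, NextLinePredictor())
last_target_wrong = count_mispredictions(pc_trace, LastTargetPredictor())
```

On this trace the learning predictor is wrong only on the first taken branch and on the loop exit, while pc+4 is wrong on every taken branch.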

## **Two-stage Pipelined RISC-V**




## Pipelining Two-Cycle RISC-V: synchronous pipeline, single rule



Fetch and Execute are concurrently active on two different instructions

    rule doPipeline;
      Fetch phase:
        fetch an instruction to be put into register ir, and
        guess the next pc
      Execute phase:
        execute the instruction in ir if it has a valid one;
        determine if the next pc is what we had guessed;
        if the guess was correct, then assign the newly fetched instruction
          and the guessed pc to ir and pc, respectively;
        if the guess was incorrect, then put Invalid in ir and the correct pc
          into pc
    endrule
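The `doPipeline` outline can be turned into a small behavioral model. The Python sketch below makes several assumptions that are not in the slides: a toy instruction set (`addi`, `bnez`), a pc+4 guess, `None` standing in for Invalid, and an out-of-range fetch returning Invalid. Because `pc` already holds the guess made when the instruction in `ir` was fetched, checking the guess amounts to comparing the correct next pc with the current `pc`.

```python
class SyncPipeline:
    def __init__(self, program):
        self.i_mem, self.pc, self.rf = program, 0, [0] * 4
        self.ir = None                 # None models Invalid; else (inst, inst_pc)
        self.cycles = self.retired = self.killed = 0

    def fetch(self, pc):
        idx = pc // 4
        return (self.i_mem[idx], pc) if 0 <= idx < len(self.i_mem) else None

    def done(self):
        return self.ir is None and self.fetch(self.pc) is None

    def cycle(self):
        # Fetch phase: fetch at pc and guess that the next pc is pc + 4
        fetched = self.fetch(self.pc)
        next_ir, next_pc = fetched, self.pc + 4
        # Execute phase: runs in the same cycle on the instruction held in ir
        if self.ir is not None:
            (op, dst, src, imm), inst_pc = self.ir
            br_taken = False
            if op == "addi":
                self.rf[dst] = self.rf[src] + imm
            elif op == "bnez":         # branch to inst_pc + imm if rf[src] != 0
                br_taken = self.rf[src] != 0
            correct_pc = inst_pc + imm if br_taken else inst_pc + 4
            self.retired += 1
            if correct_pc != self.pc:  # the guess made last cycle was wrong
                if fetched is not None:
                    self.killed += 1   # drop the wrong-path instruction
                next_ir, next_pc = None, correct_pc
        self.ir, self.pc = next_ir, next_pc
        self.cycles += 1

# Toy tuples: (op, dst, src, imm)
program = [
    ("addi", 1, 0, 2),     # r1 = r0 + 2
    ("addi", 1, 1, -1),    # r1 = r1 - 1
    ("bnez", 0, 1, -4),    # loop back while r1 != 0
    ("addi", 2, 2, 1),     # r2 = r2 + 1, reached after the loop
]
pipe = SyncPipeline(program)
while not pipe.done():
    pipe.cycle()
```

For this program the model retires 6 instructions in 8 cycles and kills one wrong-path fetch, whereas a two-cycle machine would need 12 cycles for the same 6 instructions.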


## Pipelining Two-Cycle RISC-V: synchronous pipeline, i.e., single rule



## Elastic two-stage pipeline



- We replace the f2d register with a FIFO to make the machine more elastic; that is, Fetch keeps putting instructions into f2d, and Execute keeps removing and executing instructions from f2d
- Fetch passes the pc and the predicted pc, in addition to the inst, to Execute; Execute redirects the pc in case of a misprediction
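The elastic organization can be sketched as two independent rules that communicate only through the FIFO. The Python model below is a hedged illustration: the toy instruction set, the FIFO depth of 2, the pc+4 prediction, and in particular clearing the FIFO on a misprediction are assumptions, since the text above does not say how wrong-path instructions are discarded. The `run` helper tries the rules in an arbitrary repeating order to show that the architectural result does not depend on the schedule.

```python
from collections import deque

class ElasticPipeline:
    def __init__(self, program, depth=2):
        self.i_mem, self.pc, self.rf = program, 0, [0] * 4
        self.f2d = deque()            # entries: (inst, pc, predicted_pc)
        self.depth = depth
        self.retired = 0

    def do_fetch(self):
        idx = self.pc // 4
        if len(self.f2d) >= self.depth or not 0 <= idx < len(self.i_mem):
            return False              # rule guard is false: FIFO full or no instruction
        ppc = self.pc + 4             # simplest possible prediction
        self.f2d.append((self.i_mem[idx], self.pc, ppc))
        self.pc = ppc
        return True

    def do_execute(self):
        if not self.f2d:
            return False              # rule guard is false: FIFO empty
        (op, dst, src, imm), inst_pc, ppc = self.f2d.popleft()
        br_taken = False
        if op == "addi":
            self.rf[dst] = self.rf[src] + imm
        elif op == "bnez":            # branch to inst_pc + imm if rf[src] != 0
            br_taken = self.rf[src] != 0
        correct_pc = inst_pc + imm if br_taken else inst_pc + 4
        self.retired += 1
        if correct_pc != ppc:         # misprediction
            self.pc = correct_pc      # redirect the pc
            self.f2d.clear()          # assumed way to drop wrong-path instructions
        return True

def run(program, schedule):
    """schedule: repeating pattern of 'F'/'E' naming which rule to try next.

    The pattern must contain both letters; the loop stops once a full pass
    over the pattern fires no rule.
    """
    p, i, idle = ElasticPipeline(program), 0, 0
    while idle < len(schedule):
        rule = p.do_fetch if schedule[i % len(schedule)] == "F" else p.do_execute
        idle = 0 if rule() else idle + 1
        i += 1
    return p.rf, p.retired

# Toy tuples: (op, dst, src, imm)
program = [
    ("addi", 1, 0, 2),     # r1 = r0 + 2
    ("addi", 1, 1, -1),    # r1 = r1 - 1
    ("bnez", 0, 1, -4),    # loop back while r1 != 0
    ("addi", 2, 2, 1),     # r2 = r2 + 1, reached after the loop
]
results = [run(program, s) for s in ("FE", "FFE", "FEE")]
```

All three schedules produce the same register file and retired count; only the amount of wrong-path work that Fetch performs before a redirect differs.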
